feat: expose simdutf bindings behind cargo feature #1928

Open
bartlomieju wants to merge 3 commits into main from feat/simdutf-bindings

@bartlomieju bartlomieju commented Mar 11, 2026

Summary

  • Exposes the bundled simdutf (v7.7.0) API through rusty_v8 behind a simdutf cargo feature flag
  • Solves C++ symbol clashes when consumers (e.g. deno_core) try to use a separate simdutf Rust crate alongside rusty_v8
  • Provides SIMD-accelerated Unicode validation, transcoding, and base64 operations

What's included

  • Validation: UTF-8, ASCII, UTF-16 (LE/BE), UTF-32
  • Conversion: All combinations of UTF-8, UTF-16 (LE/BE), UTF-32, Latin-1 (with and without length output)
  • Length calculation: Output length estimation for all conversion pairs
  • Counting: UTF-8/UTF-16 code point counting
  • Encoding detection: Autodetection of the input encoding based on the BOM
  • Base64: Encode/decode with configurable alphabets and chunk handling

Design

  • simdutf cargo feature → rusty_v8_enable_simdutf GN arg → RUSTY_V8_ENABLE_SIMDUTF preprocessor guard
  • Thin extern "C" wrappers in binding.cc (follows existing v8__/cppgc__ naming with simdutf__ prefix)
  • Rust API layer in src/simdutf.rs: validation/length functions are safe; conversion functions are unsafe because the caller provides the output buffer
  • Prebuilt binaries: CI matrix updated to produce _simdutf prebuilt artifacts (mirroring the _ptrcomp pattern), so consumers can use --features simdutf without V8_FROM_SOURCE
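The safe/unsafe split in the design can be illustrated with a minimal sketch. It is a pure-Rust stand-in, not the code in src/simdutf.rs: the function names, signatures, and the scalar loop are assumptions chosen for illustration, and the real bindings call into the bundled C++ library instead. The point is only the shape of the contract: a length function that merely reads its input can be safe, while a conversion that writes through a caller-supplied pointer has to be unsafe.

```rust
/// Safe: only reads `input`, so the caller has no obligations.
/// Each Latin-1 byte >= 0x80 becomes two UTF-8 bytes; every other byte stays one.
/// (Hypothetical name; the actual Rust API may differ.)
pub fn utf8_length_from_latin1(input: &[u8]) -> usize {
    input.len() + input.iter().filter(|&&b| b >= 0x80).count()
}

/// Unsafe: writes through a raw pointer supplied by the caller.
/// (Hypothetical signature, scalar stand-in for the FFI call.)
///
/// # Safety
/// `output` must be valid for writes of at least
/// `utf8_length_from_latin1(input)` bytes and must not overlap `input`.
pub unsafe fn convert_latin1_to_utf8(input: &[u8], output: *mut u8) -> usize {
    let mut written = 0;
    for &b in input {
        if b < 0x80 {
            // ASCII maps to a single identical byte.
            unsafe { output.add(written).write(b) };
            written += 1;
        } else {
            // U+0080..U+00FF encode as two UTF-8 bytes.
            unsafe {
                output.add(written).write(0xC0 | (b >> 6));
                output.add(written + 1).write(0x80 | (b & 0x3F));
            }
            written += 2;
        }
    }
    written
}

/// A safe convenience wrapper a consumer could build on top:
/// size the buffer with the length function, then convert into it.
pub fn latin1_to_utf8_vec(input: &[u8]) -> Vec<u8> {
    let len = utf8_length_from_latin1(input);
    let mut out: Vec<u8> = Vec::with_capacity(len);
    // SAFETY: `out` has capacity for `len` bytes and is a fresh allocation,
    // so it cannot overlap `input`.
    let written = unsafe { convert_latin1_to_utf8(input, out.as_mut_ptr()) };
    debug_assert_eq!(written, len);
    // SAFETY: the first `written` bytes were initialized above.
    unsafe { out.set_len(written) };
    out
}
```

Pairing every unsafe conversion with a safe length function lets callers wrap the pair once and keep the unsafe block in a single place.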

Test plan

  • Tests in tests/test_simdutf.rs cover:
    • UTF-8, ASCII, UTF-16LE, UTF-32 validation (valid + invalid inputs)
    • Validation with error position reporting
    • UTF-8 ↔ UTF-16LE round-trip conversion
    • Latin-1 ↔ UTF-8 round-trip conversion
    • UTF-8 ↔ UTF-32 round-trip conversion
    • Codepoint counting (UTF-8, UTF-16LE)
    • Length calculation cross-checks
    • Encoding detection
    • Base64 encode/decode round-trip (standard + URL-safe)
  • CI matrix produces prebuilt _simdutf artifacts for all major targets
  • Verify builds without the feature still work (no simdutf dependency)
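As a rough illustration of the checks listed in the test plan, the sketch below computes reference results using only the Rust standard library. It does not use the new bindings, whose exact function names are not shown in this PR description; the helper names here are hypothetical. Values like these could serve as the expected side of a round-trip, error-position, or length cross-check against the SIMD implementations.

```rust
/// UTF-8 -> UTF-16 -> UTF-8 round trip via std, as a reference result.
fn utf16_round_trip(s: &str) -> String {
    let units: Vec<u16> = s.encode_utf16().collect();
    String::from_utf16(&units).expect("encode_utf16 always yields valid UTF-16")
}

/// Byte offset of the first invalid UTF-8 sequence, or `None` if the input is valid.
/// Mirrors the idea of "validation with error position reporting".
fn first_utf8_error(bytes: &[u8]) -> Option<usize> {
    std::str::from_utf8(bytes).err().map(|e| e.valid_up_to())
}

/// Number of UTF-16 code units needed to hold `s`,
/// computed per code point as a length cross-check.
fn utf16_len_from_utf8(s: &str) -> usize {
    s.chars().map(char::len_utf16).sum()
}

/// Number of Unicode code points in `s`.
fn count_code_points(s: &str) -> usize {
    s.chars().count()
}
```

A cross-check then amounts to asserting that the predicted length equals the number of units actually produced by the conversion, for inputs that mix ASCII, multi-byte BMP characters, and characters outside the BMP.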

🤖 Generated with Claude Code

V8 bundles simdutf (SIMD-accelerated Unicode validation/transcoding),
but consumers like deno_core can't use a separate simdutf Rust crate
alongside rusty_v8 due to C++ symbol clashes. This exposes the bundled
simdutf API through rusty_v8 behind a `simdutf` cargo feature flag.

Changes:
- Cargo.toml: add `simdutf` feature
- BUILD.gn: conditionally link simdutf dep and define RUSTY_V8_ENABLE_SIMDUTF
- build.rs: wire cargo feature to GN arg, add to prebuilt suffix
- binding.cc: add ~250 lines of extern "C" wrappers for simdutf functions
  (validation, conversion, length calculation, base64), gated by preprocessor
- simdutf.rs: safe Rust API wrapping all exposed simdutf operations
- lib.rs: register simdutf module behind cfg(feature)
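The build.rs part of this change list can be sketched as follows. This is an illustration under assumptions, not the actual build script: the helper signatures, the boolean parameters, and the order of the suffix parts are hypothetical. What comes from the PR text is the GN arg name `rusty_v8_enable_simdutf`, the `_simdutf` and `_ptrcomp` suffixes, and the function name `prebuilt_features_suffix()`. The `CARGO_FEATURE_<NAME>` environment variable is how Cargo reports enabled features to build scripts.

```rust
/// Cargo sets CARGO_FEATURE_<NAME> (uppercased, '-' replaced by '_')
/// for every enabled feature when running a build script.
fn feature_enabled(name: &str) -> bool {
    let var = format!("CARGO_FEATURE_{}", name.to_uppercase().replace('-', "_"));
    std::env::var_os(var).is_some()
}

/// GN args contributed by the simdutf feature (hypothetical helper).
fn simdutf_gn_args(simdutf: bool) -> Vec<String> {
    if simdutf {
        vec!["rusty_v8_enable_simdutf=true".to_string()]
    } else {
        Vec::new()
    }
}

/// Suffix appended to the prebuilt artifact name so that each feature
/// combination resolves to a distinct binary.
/// The parameter list and the ordering of parts are assumptions.
fn prebuilt_features_suffix(ptrcomp: bool, simdutf: bool) -> String {
    let mut suffix = String::new();
    if ptrcomp {
        suffix.push_str("_ptrcomp");
    }
    if simdutf {
        suffix.push_str("_simdutf");
    }
    suffix
}
```

In a build script these would be driven by `feature_enabled("simdutf")`. Because the suffix changes which artifact is looked up, every suffix that can be generated needs a matching published binary.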

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@kajukitli kajukitli left a comment

not comfortable approving this as-is

biggest issue: the new simdutf feature changes the prebuilt artifact suffix (_simdutf) but the PR doesn't show any corresponding prebuilt artifacts being produced/published. that means consumers enabling the cargo feature will miss the prebuilt binary and fall back to source builds, or just break depending on how they're consuming rusty_v8.

for rusty_v8, feature-gated native code isn't just a Rust API change — it changes the binary compatibility matrix. if we add _simdutf to prebuilt_features_suffix(), we need the release/build pipeline to actually produce those artifacts too.

also, the test plan is still basically unchecked for the important parts (cargo build --features simdutf, no-clash verification, builds without feature, etc). for a 1k+ line FFI surface, that's too hand-wavy.

bartlomieju and others added 2 commits March 11, 2026 23:51
…ests

- Remove `_simdutf` from prebuilt_features_suffix() since no prebuilt
  artifacts are produced with simdutf. Instead, panic early with a
  helpful message if the simdutf feature is enabled without V8_FROM_SOURCE.
- Add tests/test_simdutf.rs with comprehensive tests covering validation,
  UTF-8/UTF-16/UTF-32/Latin-1 conversion round-trips, length calculation,
  counting, encoding detection, and base64 encode/decode.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

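The early failure described in the commit above (simdutf enabled without V8_FROM_SOURCE) could look roughly like the check below. The function name, its boolean parameters, and the message wording are assumptions made for illustration. It returns a `Result` so the logic can be tested, where a real build script would turn the error into a panic.

```rust
/// Reject the one configuration that has no prebuilt binary to download:
/// the simdutf feature enabled while not building V8 from source.
/// (Hypothetical helper; the error text is illustrative.)
fn check_simdutf_build(simdutf_enabled: bool, v8_from_source: bool) -> Result<(), String> {
    if simdutf_enabled && !v8_from_source {
        return Err(
            "the `simdutf` feature has no prebuilt binaries; \
             set V8_FROM_SOURCE=1 to build V8 from source"
                .to_string(),
        );
    }
    Ok(())
}
```

Failing at the start of the build gives a clearer error than a missing-artifact download failure later on.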
Instead of requiring V8_FROM_SOURCE for simdutf, add CI matrix entries
to build and publish prebuilt binaries with the simdutf feature enabled.
This mirrors the existing pattern used for v8_enable_pointer_compression.

Adds debug+release builds for: x86_64-apple-darwin, aarch64-apple-darwin,
x86_64-unknown-linux-gnu, aarch64-unknown-linux-gnu.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>